Knowledge of the perturbation design is essential for accurate gene regulatory network inference

The gene regulatory network (GRN) of a cell executes genetic programs in response to environmental and internal cues. Two distinct classes of methods are used to infer regulatory interactions from gene expression: those that only use observed changes in gene expression, and those that use both the observed changes and the perturbation design, i.e. the targets used to cause the changes in gene expression. Considering that the GRN by definition converts input cues to changes in gene expression, it may be conjectured that the latter methods would yield more accurate inferences but this has not previously been investigated. To address this question, we evaluated a number of popular GRN inference methods that either use the perturbation design or not. For the evaluation we used targeted perturbation knockdown gene expression datasets with varying noise levels generated by two different packages, GeneNetWeaver and GeneSpider. The accuracy was evaluated on each dataset using a variety of measures. The results show that on all datasets, methods using the perturbation design matrix consistently and significantly outperform methods not using it. This was also found to be the case on a smaller experimental dataset from E. coli. Targeted gene perturbations combined with inference methods that use the perturbation design are indispensable for accurate GRN inference.

. The precision-recall curves for data from the 100-gene GeneNetWeaver with (a,d,g,j,m) high, (b,e,h,k,n) medium, and (c,f,i,l,o) low noise levels for 5 datasets. Figure S2. The precision-recall curves for data from the 100-gene GeneSPIDER with (a,d,g,j,m) high, (b,e,h,k,n) medium, and (c,f,i,l,o) low noise levels for 5 datasets. Figure S3. The receiver-operating-characteristic (ROC) curves for data from the 100-gene GeneNetWeaver with (a,d,g,j,m) high, (b,e,h,k,n) medium, and (c,f,i,l,o) low noise levels for 5 datasets. Figure S4. The receiver-operating-characteristic (ROC) curves for data from the 100-gene GeneSPIDER with (a,d,g,j,m) high, (b,e,h,k,n) medium, and (c,f,i,l,o) low noise levels for 5 datasets. Table S1. Statistical significance from a two-tailed unpaired Mann-Whitney-U test of the differences between P-based and Non P-based methods, and between P-based methods provided with the correct and incorrect P-matrices, in terms of area under precision-recall (AUPR) curves. * denotes statistical significance with .  Table S3. Statistical significance from a two-tailed unpaired Mann-Whitney-U test of the method similarity where 10 P-based method pairs were compared to 10 Non P-based method pairs at each noise level. * denotes statistical significance with .

Data generation tool High Medium Low
GeneNetWeaver (N100) 1.299011e-04* 1.082509e-05* 1.082509e-05* GeneSPIDER (N100) 1.082509e-05* 2.165018e-05* 1.082509e-05* Figure S5. Accuracy of the GRN inference in terms of the F1-score from the 100-gene (a) GeneNetWeaver and (b) GeneSPIDER datasets. The x-axis represents different noise levels, and the y-axis denotes the maximum F1-scores calculated over different sparsities. Each method has five data points for each noise level for data generated from different true GRNs. The P-based and non P-based methods are represented by different markers and colors, and are highlighted together with blue and red, respectively. Figure S6. Accuracy of the GRN inference in terms of Matthew's correlation coefficient (MCC) from the 100-gene (a) GeneNetWeaver and (b) GeneSPIDER datasets. The x-axis represents different noise levels, and the y-axis denotes the maximum MCC levels calculated over different sparsities. Each method has five data points for each noise level for data generated from different true GRNs. The P-based and non P-based methods are represented by different markers and colors, and are highlighted together with blue and red, respectively. Figure S7. Sparsity of the GRNs with the maximum F1-scores (a-c) for the 100-gene GeneNetWeaver (GNW) data, and (d-f) for the 100-gene GeneSPIDER data at high noise (left panels), medium noise (middle panels) and low noise (right panels) levels. Figure S8. Sparsity of the GRNs with the maximum Matthew's correlation coefficient (MCC) levels (a-c) for the 100-gene GeneNetWeaver (GNW) data, and (d-f) for the 100-gene GeneSPIDER data at high noise (left panels), medium noise (middle panels) and low noise (right panels) levels.

Figure S9
. Average true link fraction in the overlap between the interactions predicted by the benchmarked methods across 5 100-gene datasets for each of the three noise levels, high (left column), medium (middle column), and low (right column) noise levels for GeneNetWeaver (a-c) and for GeneSPIDER (d-f) data. Figure S10. Accuracy of the GRN inference via the alternative benchmarking method where the self loops are removed from both the true and inferred GRNs of all methods in terms of the area under the precision-recall (AUPR) curve. Inference accuracy in terms of AUPR was averaged across 5 100-gene datasets from (a) GeneNetWeaver and (b) GeneSPIDER datasets. The circular x-axis represents different noise levels, and the y-axis denotes the AUPR levels calculated over different sparsities.  Figure S11. Accuracy in terms of area under the precision-recall (AUPR) curve from the DREAM5 E. coli (network3) subset data. Figure S12. Accuracy of the GRN inference in terms of area under precision-recall (AUPR) from the 250-gene (a) GeneNetWeaver and (b) GeneSPIDER datasets. The x-axis represents different noise levels, and the y-axis denotes the maximum MCC levels calculated over different sparsities. Each method has five data points for each noise level for data generated from different true GRNs. The P-based and non P-based methods are represented by different markers and colors, and are highlighted together with blue and red, respectively.

Supplementary Note 1.
Investigating the edge direction in GENIE3 GRNs and potential causes for the observed improvement when reversing the direction.
Before benchmarking, we tested all methods on their own published datasets, in most cases the in silico multifactorial data from the fourth round of DREAM network inference challenges to avoid potential bugs and/or uncertainties prior to GRN inference on our own generated data, and noticed a strange situation in GENIE3 GRNs. Even though we were able to reproduce the accuracy results from the DREAM4 in silico multifactorial challenge in terms of both AUROC and AUPR by using the original GENIE3 GRN format, the reverse edge direction greatly outperformed the original direction on our own 100-gene datasets generated via both GeneNetWeaver and GeneSPIDER. We investigated potential reasons for this situation, and observed that in most cases the reverse direction had a higher weight than the original one in the inferred GRNs, causing greatly improved accuracy when the output GRN matrix was transposed (Suppl. Fig. S14).

Bidirectional edges in GRNs
We investigated the fraction of bidirectional edges in both the true GRNs and the ones inferred by GENIE3. True GRNs from both GeneNetWeaver and GeneSPIDER have a sparse topology whereas GENIE3 infers a denser, fully connected, GRN consisting of weighted edges. A sparser GRN that has the exact number of interactions as the true GRN was selected, and the bidirectionality of the GRN was calculated by the fraction of bidirectional edges for which a nonzero value exists in both directions to the total number of edges in the GRN. As seen in Suppl. Fig. S13, GRNs inferred by Genie3 contain many more bidirectional links than the corresponding true GRN. Figure S13. Fraction of bidirectional edges to the total number of edges in the GRN, in the true and inferred GRNs by GENIE3 using data from the 100-gene (a) GeneNetWeaver, (b) GeneSPIDER or (c) DREAM4. True and inferred GRNs have the same sparsity. The x-axis represents the noise level of the gene expression dataset from which GENIE3 inferred GRNs, and the y-axis denotes the bidirectionality of the GRN.

Fraction of regulators in GRNs
We investigated the fraction of regulators in both the true GRNs and the ones inferred by GENIE3. About half of the genes in the 100-gene GeneNetWeaver true GRNs are regulators, and about 85% in GeneSPIDER (Suppl. Fig. S14a and b). As seen in Suppl. Fig. S14c and d, there is a strong bias in the link direction for medium and low noise, favoring the reverse direction. This in turn results in higher accuracy for the reverse direction (Suppl. Fig. S14e and f) in terms of the area under precision-recall (AUPR). Figure S14. Fraction of regulators to the total number of genes in the true and inferred GRNs using the 100-gene data from (a) GeneNetWeaver and (b) GeneSPIDER. The fraction of all interactions inferred by GENIE3 (fully connected GRN) that overlap with the true GRN where the reverse edge direction has a higher weight than the original is shown for the 100-gene data from (c) GeneNetWeaver and (d) GeneSPIDER. w r and w o refer to the edge weight in the reverse and original directions, respectively. The accuracy of the inferred GRNs, both in the reverse edge direction and the original, is shown in terms of area under precision-recall (AUPR) for the 100-gene data from (e) GeneNetWeaver and (f) GeneSPIDER. The x-axis represents the noise level of the gene expression dataset from which GENIE3 inferred GRNs.

Analyses on the DREAM4 in silico multifactorial network inference challenge data
The analyses performed for the GeneNetWeaver and GeneSPIDER data were repeated on the data from DREAM4 (Suppl. Fig. S15). In this data, the difference in accuracy for the two directions is very small, hence the link directionality in the inferred GRN is of little consequence for this particular dataset. Figure S15. Fraction of regulators and link direction effect on GRN inference accuracy in GENIE3 on the data from DREAM4. The fraction of regulators to the total number of genes in the true and inferred GRNs at the true sparsity, the fraction of all interactions inferred by GENIE3 (fully connected GRN) that overlap with the true GRN where the reverse edge direction has a higher weight than the original (w r and w o refer to the edge weight in the reverse and original directions, respectively), and the accuracy of the inferred GRNs, both in the reverse edge direction and the original, are placed on the x-axis, and their values are shown on the y-axis. The GRNs were inferred by GENIE3 from the five DREAM4 in silico multifactorial datasets.